A sparse octree gravitational N-body code that runs entirely on the GPU processor

نویسندگان

  • Jeroen Bédorf
  • Evghenii Gaburov
  • Simon Portegies Zwart
چکیده

We present the implementation and performance of a new gravitational N body tree-code that is specifically designed for the graphics processing unit (GPU) . All parts of the tree-code algorithm are executed on the GPU. We present algorithms for parallel construction and traversing of sparse octrees. These algorithms are implemented in CUDA and tested on NVIDIA GPUs, but they are portable to OpenCL and can easily be used on manycore devices from other manufacturers. This portability is achieved by using general parallel-scan and sort methods. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Highly Parallel Surface Reconstruction

We present a parallel surface reconstruction algorithm that runs entirely on the GPU. Like existing implicit surface reconstruction methods, our algorithm first builds an octree for the given set of oriented points, then computes an implicit function over the space of the octree, and finally extracts an isosurface as a water-tight triangle mesh. A key component of our algorithm is a novel techn...

متن کامل

General purpose molecular dynamics simulations fully implemented on graphics processing units

Graphics processing units (GPUs), originally developed for rendering real-time effects in computer games, now provide unprecedented computational power for scientific applications. In this paper, we develop a general purpose molecular dynamics code that runs entirely on a single GPU. It is shown that our GPU implementation provides a performance equivalent to that of fast 30 processor core dist...

متن کامل

Fast Cellular Automata Implementation on Graphic Processor Unit (GPU) for Salt and Pepper Noise Removal

Noise removal operation is commonly applied as pre-processing step before subsequent image processing tasks due to the occurrence of noise during acquisition or transmission process. A common problem in imaging systems by using CMOS or CCD sensors is appearance of  the salt and pepper noise. This paper presents Cellular Automata (CA) framework for noise removal of distorted image by the salt an...

متن کامل

An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm

This chapter describes the first CUDA implementation of the classical Barnes Hut n-body algorithm that runs entirely on the GPU. Unlike most other CUDA programs, our code builds an irregular treebased data structure and performs complex traversals on it. It consists of six GPU kernels. The kernels are optimized to minimize memory accesses and thread divergence and are fully parallelized within ...

متن کامل

Bonsai: A GPU Tree-Code

We present a gravitational hierarchical N-body code that is designed to run efficiently on Graphics Processing Units (GPUs). All parts of the algorithm are exectued on the GPU which eliminates the need for data transfer between the Central Processing Unit (CPU) and the GPU. Our tests indicate that the gravitational tree-code outperforms tuned CPU code for all parts of the algorithm and show an ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • J. Comput. Physics

دوره 231  شماره 

صفحات  -

تاریخ انتشار 2012